Introduction to quantum Fisher information

Denes Petz
Alfred Renyi Institute of Mathematics,
H-1051 Budapest, Realtanoda utca 13-15, Hungary

Catalin Ghinea
Department of Mathematics and its Applications,
Central European University, 1051 Budapest, Nador utca 9, Hungary

Abstract 

The subject of this paper is a mathematical transition from the Fisher information
of classical statistics to the matrix formalism of quantum theory. If the monotonicity
is the main requirement, then there are several quantum versions parametrized by a
function. In physical applications the minimal one is the most popular. There is a
one-to-one correspondence between Fisher informations (called also monotone metrics)
and abstract covariances. The skew information and the $\chi^2$-divergence are treated
here as particular cases.

Introduction



Parameter estimation of probability distributions is one of the most basic tasks in
information theory, and it has been generalized to the quantum regime [20, 22], since the
description of quantum measurement is essentially probabilistic. First let us have a look
at the classical Fisher information.

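Let $(X, \mathcal{B}, \mu)$ be a probability space. If $\theta = (\theta^1, \dots, \theta^n)$ is a parameter vector in
a neighborhood of $\theta_0 \in \mathbb{R}^n$, then we should have a smooth family $\mu_\theta$ of probability
measures with probability density $f_\theta$:

$$\mu_\theta(H) = \int_H f_\theta(x)\, d\mu(x) \qquad (H \in \mathcal{B}).$$

The Fisher information matrix at $\theta_0$ is

$$J(\mu_\theta; \theta_0)_{ij} := \int_X f_{\theta_0}(x)\, \frac{\partial}{\partial\theta^i} \log f_\theta(x)\Big|_{\theta=\theta_0} \frac{\partial}{\partial\theta^j} \log f_\theta(x)\Big|_{\theta=\theta_0}\, d\mu(x) \qquad (1)$$

$$= \int_X \frac{1}{f_{\theta_0}(x)}\, \partial_i f_\theta(x)\Big|_{\theta=\theta_0}\, \partial_j f_\theta(x)\Big|_{\theta=\theta_0}\, d\mu(x)$$

$$= -\int_X f_{\theta_0}(x)\, \partial_{ij} \log f_\theta(x)\Big|_{\theta=\theta_0}\, d\mu(x) \qquad (1 \le i, j \le n).$$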



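Note that $\log f_\theta(x)$ is usually called log likelihood and its derivative is the score
function.

The Fisher information matrix is positive semidefinite. For example, if the parameter
$\theta = (\theta^1, \theta^2)$ is two dimensional, then the Fisher information is a $2 \times 2$ matrix. From the
Schwarz inequality (all derivatives are taken at $\theta = \theta_0$)

$$J(\mu_\theta; \theta_0)_{12}^2 = \left[ \int_X \frac{\partial_1 f_\theta(x)}{\sqrt{f_\theta(x)}} \cdot \frac{\partial_2 f_\theta(x)}{\sqrt{f_\theta(x)}}\, d\mu(x) \right]^2$$

$$\le \int_X \left[ \frac{\partial_1 f_\theta(x)}{\sqrt{f_\theta(x)}} \right]^2 d\mu(x) \int_X \left[ \frac{\partial_2 f_\theta(x)}{\sqrt{f_\theta(x)}} \right]^2 d\mu(x) = J(\mu_\theta; \theta_0)_{11}\, J(\mu_\theta; \theta_0)_{22}.$$

Therefore the matrix $J(\mu_\theta; \theta_0)$ is positive semidefinite.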

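Assume for the sake of simplicity that $\theta$ is a single parameter. The random variable
$\hat\theta$ is an unbiased estimator for the parameter $\theta$ if

$$E_\theta(\hat\theta) := \int_X \hat\theta(x) f_\theta(x)\, d\mu(x) = \theta$$

for all $\theta$. This means that the expectation value of the estimator is the parameter. The
Cramer-Rao inequality

$$\mathrm{Var}(\hat\theta) := E_\theta\big((\hat\theta - \theta)^2\big) \ge \frac{1}{J(\mu_\theta; \theta)}$$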

gives a lower bound for the variance of an unbiased estimator. (For more parameters we 
have an inequality between positive matrices.) 
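
As a small illustration of the Cramer-Rao inequality, the following sketch evaluates both sides for a one-parameter family that is not taken from the text: the two-point (Bernoulli) family with $f_\theta(1) = \theta$, $f_\theta(0) = 1 - \theta$ with respect to the counting measure, and the estimator $\hat\theta(x) = x$. The function names are hypothetical, and exact rational arithmetic is used so that equality in the bound is visible.

```python
from fractions import Fraction


def bernoulli_fisher_information(theta):
    """Classical Fisher information of the assumed two-point family.

    Uses the second form of (1): sum over x of (d f_theta(x)/d theta)^2 / f_theta(x).
    """
    density = {1: theta, 0: 1 - theta}
    derivative = {1: 1, 0: -1}  # derivatives of the two probabilities in theta
    return sum(Fraction(derivative[x]) ** 2 / density[x] for x in (0, 1))


def estimator_variance(theta):
    """Variance of the unbiased estimator hat_theta(x) = x under f_theta."""
    density = {1: theta, 0: 1 - theta}
    mean = sum(x * density[x] for x in (0, 1))  # equals theta, so the estimator is unbiased
    return sum((x - mean) ** 2 * density[x] for x in (0, 1))


theta = Fraction(1, 4)
J = bernoulli_fisher_information(theta)
var = estimator_variance(theta)
print(J, var, var >= 1 / J)
```

For this particular family the estimator attains the lower bound, so the inequality holds with equality.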

In the quantum formalism a probability measure is replaced by a positive matrix of
trace 1. (Its eigenvalues form a probability measure, but to determine the so-called
density matrix a basis of eigenvectors must also be specified.) If a parametrized family of
density matrices $D_\theta$ is given, then a quantum Fisher information can be defined.
This quantity is not unique; the possibilities are determined by linear mappings.
The analysis of these linear mappings is the main issue of the paper. In physics mostly
$\theta \in \mathbb{R}$, but if it is an $n$-tuple, then Riemannian geometries appear. A coarse-graining
gives a monotonicity of the Fisher informations and this is the second main subject of
the present overview.

Fisher information has a big literature both in the classical and in the quantum
case. The list of references is not at all complete here. The aim is to give an
introduction.

1 A general quantum setting



The Cramer-Rao inequality belongs to the basics of estimation theory in mathematical 
statistics. Its quantum analog appeared in the 1970's, see the book [20] of Helstrom 
and the book [22] of Holevo. Although both the classical Cramer-Rao inequality and 
its quantum analog are mathematically as trivial as the Schwarz inequality, the subject 
takes a lot of attention because it is located on the boundary of statistics, information 
and quantum theory. As a starting point we give a very general form of the quantum 
Cramer-Rao inequality in the simple setting of finite dimensional quantum mechanics. 
The paper [13] is followed here. 

For $\theta \in (-\varepsilon, \varepsilon) \subset \mathbb{R}$ a statistical operator $\rho(\theta)$ is given and the aim is to estimate
the value of the parameter $\theta$ close to 0. Formally $\rho(\theta)$ is an $n \times n$ positive semidefinite
matrix of trace 1 which describes a mixed state of a quantum mechanical system and
we assume that $\rho(\theta)$ is smooth (in $\theta$). Assume that an estimation is performed by the
measurement of a self-adjoint matrix $A$ playing the role of an observable. $A$ is called
a locally unbiased estimator if

$$\frac{\partial}{\partial\theta}\, \mathrm{Tr}\, \rho(\theta) A\, \Big|_{\theta=0} = 1. \qquad (2)$$

This condition holds if $A$ is an unbiased estimator for $\theta$, that is

$$\mathrm{Tr}\, \rho(\theta) A = \theta \qquad (\theta \in (-\varepsilon, \varepsilon)). \qquad (3)$$

To require this equality for all values of the parameter is a serious restriction on the
observable $A$ and we prefer to use the weaker condition (2).

Let $[K, L]_\rho$ be an inner product (or quadratic cost function) on the linear space of
self-adjoint matrices. This inner product depends on a density matrix and its meaning
is not described now. When $\rho(\theta)$ is smooth in $\theta$, as was already assumed above, then

$$\frac{\partial}{\partial\theta}\, \mathrm{Tr}\, \rho(\theta) B\, \Big|_{\theta=0} = [B, L]_{\rho(0)} \qquad (4)$$

with some $L = L^*$. From (2) and (4), we have $[A, L]_{\rho(0)} = 1$ and the Schwarz inequality
yields

$$[A, A]_{\rho(0)} \ge \frac{1}{[L, L]_{\rho(0)}}. \qquad (5)$$

This is the celebrated inequality of Cramer-Rao type for the locally unbiased
estimator.

The right-hand side of (5) is independent of the estimator and provides a lower bound
for the quadratic cost. The denominator $[L, L]_{\rho(0)}$ appears to be in the role of Fisher
information here. We call it quantum Fisher information with respect to the cost
function $[\,\cdot\,, \cdot\,]_{\rho(0)}$. This quantity depends on the tangent of the curve $\rho(\theta)$. If the
densities $\rho(\theta)$ and the estimator $A$ commute, then

$$L = \rho(0)^{-1}\, \frac{d\rho(\theta)}{d\theta}\Big|_{\theta=0} \quad \text{and} \quad [L, L]_{\rho(0)} = \mathrm{Tr}\, \rho(0)^{-1} \left( \frac{d\rho(\theta)}{d\theta}\Big|_{\theta=0} \right)^2.$$

Now we can see some similarity with (1).

The quantum Fisher information was defined as $[L, L]_{\rho(0)}$, where

$$\frac{\partial}{\partial\theta}\, \rho(\theta)\, \Big|_{\theta=0} = L.$$

This $L$ is unique, but the quantum Fisher information depends on the inner product
$[\,\cdot\,, \cdot\,]_{\rho(0)}$. This is not unique, there are several possibilities to choose a reasonable inner
product $[\,\cdot\,, \cdot\,]_{\rho(0)}$. Note that $[A, A]_\rho$ should have the interpretation of "variance" (if
$\mathrm{Tr}\, \rho A = 0$).

Another approach is due to Braunstein and Caves [1] in physics, but Nagaoka
considered a similar approach.



1.1 From classical Fisher information via measurement 

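The observable $A$ has a spectral decomposition

$$A = \sum_{i=1}^k \lambda_i E_i.$$

(Actually the property $E_i^2 = E_i$ is not so important, only $E_i \ge 0$ and $\sum_i E_i = I$. Hence
$\{E_i\}$ can be a so-called POVM as well.) On the set $\mathcal{X} = \{1, 2, \dots, k\}$ we have probability
distributions

$$\mu_\theta(\{i\}) = \mathrm{Tr}\, \rho(\theta) E_i.$$

Indeed,

$$\sum_{i=1}^k \mu_\theta(\{i\}) = \mathrm{Tr}\, \rho(\theta) \sum_{i=1}^k E_i = \mathrm{Tr}\, \rho(\theta) = 1.$$

Since

$$\mu_\theta(\{i\}) = \frac{\mathrm{Tr}\, \rho(\theta) E_i}{\mathrm{Tr}\, D E_i}\, \mathrm{Tr}\, D E_i,$$

we can take

$$\mu(\{i\}) = \mathrm{Tr}\, D E_i,$$

where $D$ is a statistical operator. Then

$$f_\theta(\{i\}) = \frac{\mathrm{Tr}\, \rho(\theta) E_i}{\mathrm{Tr}\, D E_i} \qquad (6)$$

and we have the classical Fisher information defined in (1):

$$\sum_i \frac{\mathrm{Tr}\, \rho(\theta) E_i}{\mathrm{Tr}\, D E_i} \left[ \frac{\mathrm{Tr}\, \rho(\theta)' E_i}{\mathrm{Tr}\, D E_i} \Big/ \frac{\mathrm{Tr}\, \rho(\theta) E_i}{\mathrm{Tr}\, D E_i} \right]^2 \mathrm{Tr}\, D E_i = \sum_i \frac{[\mathrm{Tr}\, \rho(\theta)' E_i]^2}{\mathrm{Tr}\, \rho(\theta) E_i},$$

where $\rho(\theta)' = \partial\rho(\theta)/\partial\theta$. (This does not depend on $D$.) In the paper [3] the notation

$$F(\rho(\theta); \{E(\xi)\}) = \int \frac{[\mathrm{Tr}\, \rho(\theta)' E(\xi)]^2}{\mathrm{Tr}\, \rho(\theta) E(\xi)}\, d\xi$$

is used, this is an integral form, and for Braunstein and Caves the quantum Fisher
information is the supremum of these classical Fisher informations [1].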

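Theorem 1.1 Assume that $D$ is a positive definite density matrix, $B = B^*$ and $\mathrm{Tr}\, B = 0$.
If $\rho(\theta) = D + \theta B + o(\theta^2)$, then the supremum of

$$F(\rho(0); \{E_i\}) = \sum_i \frac{[\mathrm{Tr}\, B E_i]^2}{\mathrm{Tr}\, D E_i} \qquad (7)$$

over the measurements $A = \sum_{i=1}^k \lambda_i E_i$ is

$$\mathrm{Tr}\, B\, \mathbb{J}_D^{-1}(B), \quad \text{where} \quad \mathbb{J}_D C = (DC + CD)/2. \qquad (8)$$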

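Proof: The linear mapping $\mathbb{J}_D$ is invertible, so we can replace $B$ in (7) by $\mathbb{J}_D(C)$. We
have to show

$$\sum_i \frac{[\mathrm{Tr}\, \mathbb{J}_D(C) E_i]^2}{\mathrm{Tr}\, D E_i} = \frac{1}{4} \sum_i \frac{(\mathrm{Tr}\, C D E_i)^2 + (\mathrm{Tr}\, D C E_i)^2 + 2 (\mathrm{Tr}\, C D E_i)(\mathrm{Tr}\, D C E_i)}{\mathrm{Tr}\, D E_i} \le \mathrm{Tr}\, D C^2.$$

This follows from

$$(\mathrm{Tr}\, C D E_i)^2 = \left( \mathrm{Tr}\, (E_i^{1/2} C D^{1/2})(D^{1/2} E_i^{1/2}) \right)^2 \le \mathrm{Tr}\, E_i C D C\; \mathrm{Tr}\, D^{1/2} E_i D^{1/2} = \mathrm{Tr}\, E_i C D C\; \mathrm{Tr}\, D E_i$$

and

$$(\mathrm{Tr}\, D C E_i)^2 = \left( \mathrm{Tr}\, (E_i^{1/2} D^{1/2})(D^{1/2} C E_i^{1/2}) \right)^2 \le \mathrm{Tr}\, D^{1/2} E_i D^{1/2}\; \mathrm{Tr}\, D^{1/2} C E_i C D^{1/2} = \mathrm{Tr}\, E_i C D C\; \mathrm{Tr}\, D E_i.$$

So $F(\rho(0); \{E_i\}) \le \mathrm{Tr}\, D C^2$ holds for any measurement $\{E_i\}$.

Next we want to analyze the condition for equality. Let $\mathbb{J}_D^{-1} B = C = \sum_k \lambda_k P_k$ be the
spectral decomposition. In the Schwarz inequalities the condition of equality is

$$D^{1/2} E_i^{1/2} = c_i D^{1/2} C E_i^{1/2},$$

which is

$$E_i^{1/2} = c_i C E_i^{1/2}.$$

So $E_i^{1/2} \le P_{j(i)}$ for a spectral projection $P_{j(i)}$. This implies that all projections $P_j$ are
the sums of certain $E_i$'s. (The simplest measurement for equality corresponds to the
observable $C$.) □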

Note that $\mathbb{J}_D^{-1}$ is in Example 1. It is an exercise to show that for

$$D = \begin{pmatrix} r & 0 \\ 0 & 1 - r \end{pmatrix}, \qquad B = \begin{pmatrix} a & b \\ b & -a \end{pmatrix}$$

the optimal observable is

$$C = \begin{pmatrix} \dfrac{a}{r} & 2b \\ 2b & -\dfrac{a}{1 - r} \end{pmatrix}.$$

The quantum Fisher information (8) is a particular case of the general approach of
the previous section, $\mathbb{J}_D$ is in Example 1 below, this is the minimal quantum Fisher
information which is also called SLD Fisher information. The inequality between (7)
and (8) is a particular case of the monotonicity, see [10| W2\ and Theorem 1.2 below.



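If $D = \mathrm{Diag}(\lambda_1, \dots, \lambda_n)$, then

$$F_{\min}(D; B) := \mathrm{Tr}\, B\, \mathbb{J}_D^{-1}(B) = \sum_{ij} \frac{2}{\lambda_i + \lambda_j} |B_{ij}|^2.$$

In particular,

$$F_{\min}(D; i[D, X]) = \sum_{ij} \frac{2 (\lambda_i - \lambda_j)^2}{\lambda_i + \lambda_j} |X_{ij}|^2$$

and for commuting $D$ and $B$ we have

$$F_{\min}(D; B) = \mathrm{Tr}\, D^{-1} B^2.$$

The minimal quantum Fisher information corresponds to the inner product

$$[A, B]_\rho = \tfrac{1}{2}\, \mathrm{Tr}\, \rho (AB + BA) = \mathrm{Tr}\, A\, \mathbb{J}_\rho(B).$$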
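
The relation between (7) and (8) can be checked numerically on the qubit exercise above. The sketch below is only an illustration under stated assumptions: the numerical values of $r$, $a$, $b$ are arbitrary, the measurements are restricted to real projective ones with $E_1$ the projection onto $(\cos\varphi, \sin\varphi)$ (which suffices here because the optimal observable $C$ is real symmetric), and the function names are hypothetical.

```python
import math


def f_min_diagonal(eigenvalues, B):
    """Minimal (SLD) quantum Fisher information Tr B J_D^{-1}(B) for D = Diag(eigenvalues)."""
    n = len(eigenvalues)
    return sum(
        2.0 * abs(B[i][j]) ** 2 / (eigenvalues[i] + eigenvalues[j])
        for i in range(n)
        for j in range(n)
    )


def classical_fisher_qubit(r, a, b, phi):
    """Classical Fisher information (7) of the measurement {E_1, E_2 = I - E_1}.

    D = Diag(r, 1 - r), B = [[a, b], [b, -a]], E_1 projects onto (cos phi, sin phi).
    """
    c, s = math.cos(phi), math.sin(phi)
    tr_B_E1 = a * (c * c - s * s) + 2.0 * b * c * s  # Tr B E_1
    tr_D_E1 = r * c * c + (1.0 - r) * s * s          # Tr D E_1
    # Tr B E_2 = -Tr B E_1 because Tr B = 0, and Tr D E_2 = 1 - Tr D E_1.
    return tr_B_E1 ** 2 / tr_D_E1 + tr_B_E1 ** 2 / (1.0 - tr_D_E1)


r, a, b = 0.3, 0.1, 0.2  # arbitrary illustrative values
quantum = f_min_diagonal([r, 1.0 - r], [[a, b], [b, -a]])
angles = [k * math.pi / 2000 for k in range(2000)]
best_classical = max(classical_fisher_qubit(r, a, b, phi) for phi in angles)
print(quantum, best_classical)
```

On this grid every classical value stays below the quantum one, and the largest classical value approaches it, in line with Theorem 1.1.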



Assume now that $\theta = (\theta^1, \theta^2)$. The formula (6) is still true. If

$$\partial_i \rho(\theta) = B_i,$$

then the classical Fisher information matrix $F(\rho(0); \{E_k\})_{ij}$ has the entries

$$F(\rho(0); \{E_k\})_{ij} = \sum_k \frac{\mathrm{Tr}\, B_i E_k\; \mathrm{Tr}\, B_j E_k}{\mathrm{Tr}\, \rho(0) E_k} \qquad (9)$$

and the quantum Fisher information matrix is

$$\begin{pmatrix} \mathrm{Tr}\, B_1 \mathbb{J}_D^{-1}(B_1) & \mathrm{Tr}\, B_1 \mathbb{J}_D^{-1}(B_2) \\ \mathrm{Tr}\, B_2 \mathbb{J}_D^{-1}(B_1) & \mathrm{Tr}\, B_2 \mathbb{J}_D^{-1}(B_2) \end{pmatrix}. \qquad (10)$$

Is there any inequality between the two matrices?

Let $\beta(A) = \sum_k E_k A E_k$. This is a completely positive trace preserving mapping. In
the terminology of Theorem 2.3 the matrix (10) is $J_1$ and

$$J_2 = F(\rho(0); \{E_k\}).$$

The theorem states the inequality $J_2 \le J_1$.



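1.2 The linear mapping $\mathbb{J}_D$

Let $D \in M_n$ be a positive invertible matrix. The linear mapping $\mathbb{J}_D^f : M_n \to M_n$ is
defined by the formula

$$\mathbb{J}_D^f = f(L_D R_D^{-1}) R_D,$$

where $f : \mathbb{R}^+ \to \mathbb{R}^+$,

$$L_D(X) = DX \quad \text{and} \quad R_D(X) = XD.$$

(The operator $L_D R_D^{-1}$ appeared in the modular theory of von Neumann algebras.)

Lemma 1.1 Assume that $f : \mathbb{R}^+ \to \mathbb{R}^+$ is continuous and $D = \mathrm{Diag}(\lambda_1, \lambda_2, \dots, \lambda_n)$.
Then

$$(\mathbb{J}_D^f B)_{ij} = \lambda_j f\!\left( \frac{\lambda_i}{\lambda_j} \right) B_{ij}.$$

Moreover, if $f_1 \le f_2$, then $0 \le \mathbb{J}_D^{f_1} \le \mathbb{J}_D^{f_2}$.

Proof: Let $f(x) = x^k$. Then

$$\mathbb{J}_D^f B = D^k B D^{1-k}$$

and

$$(\mathbb{J}_D^f B)_{ij} = \lambda_i^k \lambda_j^{1-k} B_{ij} = \lambda_j f\!\left( \frac{\lambda_i}{\lambda_j} \right) B_{ij}.$$

This is true for polynomials and for any continuous $f$ by approximation. □

It follows from the lemma that

$$\langle A, \mathbb{J}_D^f B \rangle = \langle B^*, \mathbb{J}_D^f A^* \rangle \qquad (11)$$

if and only if

$$\lambda_j f\!\left( \frac{\lambda_i}{\lambda_j} \right) = \lambda_i f\!\left( \frac{\lambda_j}{\lambda_i} \right),$$

which means $x f(x^{-1}) = f(x)$. Condition (11) is equivalent to the property that
$\langle X, \mathbb{J}_D^f Y \rangle \in \mathbb{R}$ when $X$ and $Y$ are self-adjoint.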

The functions $f : \mathbb{R}^+ \to \mathbb{R}^+$ used here are the standard operator monotone
functions defined as

(i) if for positive matrices $A \le B$, then $f(A) \le f(B)$,

(ii) $x f(x^{-1}) = f(x)$ and $f(1) = 1$.

These functions are between the arithmetic and harmonic means [27J EI] : 

$$\frac{2x}{x + 1} \le f(x) \le \frac{1 + x}{2}.$$

Given $f$,

$$m_f(x, y) = y f\!\left( \frac{x}{y} \right)$$

is the corresponding mean and we have

$$(\mathbb{J}_D^f B)_{ij} = m_f(\lambda_i, \lambda_j) B_{ij}. \qquad (12)$$

Hence

$$\mathbb{J}_D^f B = X \circ B$$

is a Hadamard product with $X_{ij} = m_f(\lambda_i, \lambda_j)$. Therefore the linear mapping $\mathbb{J}_D^f$ is
positivity preserving if and only if the above $X$ is positive.

The inverse of $\mathbb{J}_D^f$ is the mapping $(\mathbb{J}_D^f)^{-1}$,
which acts as $B \mapsto Y \circ B$ with $Y_{ij} = 1/m_f(\lambda_i, \lambda_j)$. So $(\mathbb{J}_D^f)^{-1}$ is positivity preserving if
and only if $Y$ is positive.

A necessary condition for the positivity of $\mathbb{J}_D^f$ is $f(x) \le \sqrt{x}$, while the necessary
condition for the positivity of $(\mathbb{J}_D^f)^{-1}$ is $f(x) \ge \sqrt{x}$. So only $f(x) = \sqrt{x}$ is the function
which can make both mappings positivity preserving. 
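
The Hadamard-product description (12) is easy to experiment with for diagonal $D$. The following sketch uses the four standard functions that appear in Examples 1 to 4 of this section; the dictionary labels, helper names and the eigenvalue pair $(1, 4)$ are choices made for this illustration only. For a $2 \times 2$ symmetric kernel with positive diagonal, positive semidefiniteness is equivalent to a nonnegative determinant, which is what the last helper checks.

```python
import math

# Labels are chosen for this sketch; the functions are those of Examples 1 to 4.
STANDARD_FUNCTIONS = {
    "harmonic": lambda x: 2.0 * x / (x + 1.0),
    "geometric": math.sqrt,
    "logarithmic": lambda x: 1.0 if x == 1.0 else (x - 1.0) / math.log(x),
    "arithmetic": lambda x: (x + 1.0) / 2.0,
}


def mean(f, x, y):
    """m_f(x, y) = y f(x / y) for x, y > 0."""
    return y * f(x / y)


def hadamard_kernel(f, eigenvalues):
    """Matrix X with X_ij = m_f(lambda_i, lambda_j), so that J_D^f B = X o B as in (12)."""
    return [[mean(f, li, lj) for lj in eigenvalues] for li in eigenvalues]


def kernel_determinant_2x2(f, l1, l2):
    """Determinant of the 2 x 2 kernel; X is positive semidefinite iff this is >= 0."""
    X = hadamard_kernel(f, [l1, l2])
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]


determinants = {
    name: kernel_determinant_2x2(f, 1.0, 4.0) for name, f in STANDARD_FUNCTIONS.items()
}
print(determinants)
```

In this example the kernel is positive for the harmonic mean, degenerate for the geometric mean, and not positive for the logarithmic and arithmetic means, consistent with the necessary condition $f(x) \le \sqrt{x}$.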

Example 1 If $f(x) = (x + 1)/2$ (arithmetic mean), then

$$\mathbb{J}_D B = \frac{1}{2}(DB + BD) \quad \text{and} \quad \mathbb{J}_D^{-1} B = \int_0^\infty \exp(-tD/2)\, B \exp(-tD/2)\, dt.$$

This is from the solution of the equation $DX + XD = 2B$ for the unknown $X$. □

Example 2 If $f(x) = 2x/(x + 1)$ (harmonic mean), then

$$\mathbb{J}_D B = \int_0^\infty \exp(-tD^{-1}/2)\, B \exp(-tD^{-1}/2)\, dt$$

and

$$\mathbb{J}_D^{-1} B = \frac{1}{2}(D^{-1} B + B D^{-1}).$$

This function $f$ is the minimal and it generates the maximal Fisher information which
is also called right information matrix. □

Example 3 For the logarithmic mean

$$f(x) = \frac{x - 1}{\log x} \qquad (13)$$

we have

$$\mathbb{J}_D(B) = \int_0^1 D^t B D^{1-t}\, dt \quad \text{and} \quad \mathbb{J}_D^{-1}(B) = \int_0^\infty (D + t)^{-1} B (D + t)^{-1}\, dt.$$

This function induces an important Fisher information. □

Example 4 For the geometric mean $f(x) = \sqrt{x}$ and

$$\mathbb{J}_D(B) = D^{1/2} B D^{1/2} \quad \text{and} \quad \mathbb{J}_D^{-1}(B) = D^{-1/2} B D^{-1/2}.$$

□

$\mathbb{J}_D$ is the largest if $f$ is the largest which is described in Example 1 and the smallest
is in Example 2.

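Theorem 1.2 Let $\beta : M_n \to M_m$ be a completely positive trace preserving mapping and
$f : \mathbb{R}^+ \to \mathbb{R}^+$ be a matrix monotone function. Then

$$\beta^* (\mathbb{J}_{\beta(D)}^f)^{-1} \beta \le (\mathbb{J}_D^f)^{-1} \qquad (14)$$

and

$$\beta\, \mathbb{J}_D^f\, \beta^* \le \mathbb{J}_{\beta(D)}^f. \qquad (15)$$

Actually (14) and (15) are equivalent and they are equivalent to the matrix
monotonicity of $f$.

In the rest $f$ is always assumed to be a standard matrix monotone function. Then
$\mathrm{Tr}\, \mathbb{J}_D B = \mathrm{Tr}\, DB$.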

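Example 5 Here we want to study the case when $D$ can have 0 eigenvalues. Formula (12)
makes sense. For example, if $D = \mathrm{Diag}(0, \lambda, \mu, \mu)$ ($\lambda, \mu > 0$, $\lambda \ne \mu$), then

$$\mathbb{J}_D^f B = \begin{pmatrix}
0 & m(0, \lambda) B_{12} & m(0, \mu) B_{13} & m(0, \mu) B_{14} \\
m(0, \lambda) B_{21} & \lambda B_{22} & m(\lambda, \mu) B_{23} & m(\lambda, \mu) B_{24} \\
m(0, \mu) B_{31} & m(\lambda, \mu) B_{32} & \mu B_{33} & \mu B_{34} \\
m(0, \mu) B_{41} & m(\lambda, \mu) B_{42} & \mu B_{43} & \mu B_{44}
\end{pmatrix}.$$

If $f(0) > 0$, then this matrix has only one 0 entry. If $f(0) = 0$, then

$$\mathbb{J}_D^f B = \begin{pmatrix}
0 & 0 & 0 & 0 \\
0 & \lambda B_{22} & m(\lambda, \mu) B_{23} & m(\lambda, \mu) B_{24} \\
0 & m(\lambda, \mu) B_{32} & \mu B_{33} & \mu B_{34} \\
0 & m(\lambda, \mu) B_{42} & \mu B_{43} & \mu B_{44}
\end{pmatrix}$$

and the kernel of $\mathbb{J}_D$ is larger. We have

$$\langle B, \mathbb{J}_D^f B \rangle = \sum_{ij} m_f(\lambda_i, \lambda_j) |B_{ij}|^2$$

and some terms can be 0 if $D$ is not invertible.

The inverse of $\mathbb{J}_D^f$ exists in the generalized sense

$$[(\mathbb{J}_D^f)^{-1} B]_{ij} = \begin{cases} \dfrac{1}{m_f(\lambda_i, \lambda_j)} B_{ij} & \text{if } m_f(\lambda_i, \lambda_j) \ne 0, \\ 0 & \text{if } m_f(\lambda_i, \lambda_j) = 0. \end{cases}$$

(This is the Moore-Penrose generalized inverse.) □

It would be interesting to compare the functions which are non-zero at 0 with the others.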

2 Fisher information and covariance 

Assume that $f$ is a standard matrix monotone function. The operators $\mathbb{J}_D^f$ are used to
define Fisher information and the covariance. (The latter can be called also quadratic
cost.) The operator $\mathbb{J}_D^f$ depends on the function $f$, but $f$ will sometimes not be written.

Let $A = A^*$, $B = B^* \in M_n$ be observables and $D \in M_n$ be a density matrix. The
covariance of $A$ and $B$ is

$$\mathrm{Cov}_D^f(A, B) := \langle A, \mathbb{J}_D^f(B) \rangle - (\mathrm{Tr}\, DA)(\mathrm{Tr}\, DB). \qquad (16)$$

Since

$$\mathrm{Cov}_D^f(A, A) = \langle (A - I\, \mathrm{Tr}\, DA), \mathbb{J}_D (A - I\, \mathrm{Tr}\, DA) \rangle$$

and $\mathbb{J}_D \ge 0$, we have for the variance $\mathrm{Var}_D^f(A) := \mathrm{Cov}_D^f(A, A) \ge 0$.
The monotonicity (15) gives

$$\mathrm{Var}_D^f(\beta^* A) \le \mathrm{Var}_{\beta(D)}^f(A)$$

for a completely positive trace preserving mapping $\beta$.

The usual symmetrized covariance corresponds to the function $f(t) = (t + 1)/2$:

$$\mathrm{Cov}_D(A, B) := \frac{1}{2}\, \mathrm{Tr}\, (D (A^* B + B A^*)) - (\mathrm{Tr}\, DA^*)(\mathrm{Tr}\, DB).$$

Let $A_1, A_2, \dots, A_k$ be self-adjoint matrices and let $D$ be a statistical operator. The
covariance is a $k \times k$ matrix $C(D)$ defined as

$$C(D)_{ij} = \mathrm{Cov}_D^f(A_i, A_j). \qquad (17)$$

$C(D)$ is a positive semidefinite matrix and positive definite if the observables $A_1, A_2, \dots, A_k$
are linearly independent. It should be remarked that this matrix is only a formal analogue
of the classical covariance matrix and it is not related to a single quantum measurement.
The variance is defined by $\mathbb{J}_D$ and the Fisher information is formulated by the inverse
of this mapping:

$$\gamma_D(A, B) = \mathrm{Tr}\, A\, \mathbb{J}_D^{-1}(B^*). \qquad (18)$$

Here $A$ and $B$ are self-adjoint. If $A$ and $B$ are considered as tangent vectors at the
footpoint $D$, then $\mathrm{Tr}\, A = \mathrm{Tr}\, B = 0$. In this approach $\gamma_D(A, B)$ is an inner product in
a Riemannian geometry [2, 21]. It seems that this approach is not popular in quantum
theory. It happens also that the condition $\mathrm{Tr}\, D = 1$ is neglected and only $D > 0$ is required. Then
formula (18) can be extended [26].

If $DA = AD$ for a self-adjoint matrix $A$, then

$$\gamma_D(A, A) = \mathrm{Tr}\, D^{-1} A^2$$

does not depend on the function $f$. (The dependence is characteristic on the orthogonal
complement, this will come later.) 
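
For diagonal $D$ the statement can be illustrated directly from (18) and (12), since then $\gamma_D^f(A, A) = \sum_{ij} |A_{ij}|^2 / m_f(\lambda_i, \lambda_j)$. The sketch below is an illustration only; the eigenvalues, the two test matrices and all names are assumptions made here. It compares a diagonal (commuting) tangent vector with an off-diagonal (non-commuting) one for three standard functions.

```python
import math

# Three standard functions; the labels are chosen for this sketch.
FUNCTIONS = {
    "arithmetic": lambda x: (x + 1.0) / 2.0,
    "geometric": math.sqrt,
    "harmonic": lambda x: 2.0 * x / (x + 1.0),
}


def fisher_information(f, eigenvalues, A):
    """gamma_D^f(A, A) = sum_ij |A_ij|^2 / (l_j f(l_i / l_j)) for D = Diag(eigenvalues)."""
    n = len(eigenvalues)
    return sum(
        abs(A[i][j]) ** 2 / (eigenvalues[j] * f(eigenvalues[i] / eigenvalues[j]))
        for i in range(n)
        for j in range(n)
    )


eigenvalues = [0.6, 0.4]
commuting = [[1.0, 0.0], [0.0, -1.0]]      # diagonal and traceless, commutes with D
non_commuting = [[0.0, 1.0], [1.0, 0.0]]   # off-diagonal, does not commute with D

values_commuting = {
    name: fisher_information(f, eigenvalues, commuting) for name, f in FUNCTIONS.items()
}
values_non_commuting = {
    name: fisher_information(f, eigenvalues, non_commuting) for name, f in FUNCTIONS.items()
}
print(values_commuting)
print(values_non_commuting)
```

In the commuting case all three values coincide with $\mathrm{Tr}\, D^{-1} A^2$, while in the non-commuting case the arithmetic mean gives the smallest value and the harmonic mean the largest.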

Theorem 2.1 Assume that $(A, B) \mapsto \gamma_D(A, B)$ is an inner product for $A, B \in M_n$,
for positive definite density matrix $D \in M_n$ and for every $n$. Suppose the following
properties:

(i) For commuting $D$ and $A = A^*$ we have $\gamma_D(A, A) = \mathrm{Tr}\, D^{-1} A^2$.

(ii) If $\beta : M_n \to M_m$ is a completely positive trace preserving mapping, then

$$\gamma_{\beta(D)}(\beta(A), \beta(A)) \le \gamma_D(A, A). \qquad (19)$$

(iii) If $A = A^*$ and $B = B^*$, then $\gamma_D(A, B)$ is a real number.

(iv) $D \mapsto \gamma_D(A, B)$ is continuous.

Then

$$\gamma_D(A, B) = \langle A, (\mathbb{J}_D^f)^{-1} B \rangle \qquad (20)$$

for a standard matrix monotone function $f$.

Example 6 In quantum statistical mechanics, perturbation of a density matrix appears.
Suppose that $D = e^H$ and $A = A^*$ is the perturbation

$$D_t = \frac{e^{H + tA}}{\mathrm{Tr}\, e^{H + tA}} \qquad (t \in \mathbb{R}).$$

The quantum analog of formula (1) would be

$$-\mathrm{Tr}\, D_0\, \frac{\partial^2}{\partial t^2} \log D_t\, \Big|_{t=0}.$$

A simple computation gives

$$\int_0^1 \mathrm{Tr}\, e^{sH} A e^{(1-s)H} A\, ds - (\mathrm{Tr}\, DA)^2.$$

This is a kind of variance. □



Let $\mathcal{M} := \{D_\theta : \theta \in G\}$ be a smooth $m$-dimensional manifold of $n \times n$ density
matrices. Formally $G \subset \mathbb{R}^m$ is an open set including 0. If $\theta \in G$, then $\theta = (\theta_1, \theta_2, \dots, \theta_m)$.
The Riemannian structure on $\mathcal{M}$ is given by the inner product (18) of the tangent
vectors $A$ and $B$ at the foot point $D \in \mathcal{M}$, where $\mathbb{J}_D : M_n \to M_n$ is a positive mapping
when $M_n$ is regarded as a Hilbert space with the Hilbert-Schmidt inner product. (This
means $\mathrm{Tr}\, A\, \mathbb{J}_D(A)^* \ge 0$.)

Assume that a collection $A = (A_1, \dots, A_m)$ of self-adjoint matrices is used to estimate
the true value of $\theta$. The expectation value of $A_i$ with respect to the density matrix $D$ is
$\mathrm{Tr}\, D A_i$. $A$ is an unbiased estimator if

$$\mathrm{Tr}\, D_\theta A_i = \theta_i \qquad (1 \le i \le m). \qquad (21)$$

(In many cases an unbiased estimator $A = (A_1, \dots, A_m)$ does not exist, therefore a weaker
condition is more useful.)

The Fisher information matrix of the estimator $A$ is a positive definite matrix

$$J(D)_{ij} = \mathrm{Tr}\, L_i\, \mathbb{J}_D(L_j), \quad \text{where} \quad L_i = \mathbb{J}_D^{-1}(\partial_i D_\theta).$$

Both $C(D)$ and $J(D)$ depend on the actual state $D$.

The next theorem is the Cramer-Rao inequality for matrices. The point is
that the right-hand side does not depend on the estimators.

Theorem 2.2 Let $A = (A_1, \dots, A_m)$ be an unbiased estimator of $\theta$. Then for the above
defined matrices the inequality

$$C(D_\theta) \ge J(D_\theta)^{-1}$$

holds.



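Proof: In the proof the block-matrix method is used and we restrict ourselves to
$m = 2$ for the sake of simplicity and assume that $\theta = 0$. Instead of $D_0$ we write $D$.
The matrices $A_1, A_2, L_1, L_2$ are considered as vectors and from the inner product
$\langle A, B \rangle = \mathrm{Tr}\, A\, \mathbb{J}_D(B)^*$ we have the positive matrix

$$X := \begin{pmatrix}
\mathrm{Tr}\, A_1 \mathbb{J}_D(A_1) & \mathrm{Tr}\, A_1 \mathbb{J}_D(A_2) & \mathrm{Tr}\, A_1 \mathbb{J}_D(L_1) & \mathrm{Tr}\, A_1 \mathbb{J}_D(L_2) \\
\mathrm{Tr}\, A_2 \mathbb{J}_D(A_1) & \mathrm{Tr}\, A_2 \mathbb{J}_D(A_2) & \mathrm{Tr}\, A_2 \mathbb{J}_D(L_1) & \mathrm{Tr}\, A_2 \mathbb{J}_D(L_2) \\
\mathrm{Tr}\, L_1 \mathbb{J}_D(A_1) & \mathrm{Tr}\, L_1 \mathbb{J}_D(A_2) & \mathrm{Tr}\, L_1 \mathbb{J}_D(L_1) & \mathrm{Tr}\, L_1 \mathbb{J}_D(L_2) \\
\mathrm{Tr}\, L_2 \mathbb{J}_D(A_1) & \mathrm{Tr}\, L_2 \mathbb{J}_D(A_2) & \mathrm{Tr}\, L_2 \mathbb{J}_D(L_1) & \mathrm{Tr}\, L_2 \mathbb{J}_D(L_2)
\end{pmatrix}.$$

From the condition (21), we have

$$\mathrm{Tr}\, A_i\, \mathbb{J}_D(L_i) = \frac{\partial}{\partial\theta_i}\, \mathrm{Tr}\, D_\theta A_i = 1$$

for $i = 1, 2$ and

$$\mathrm{Tr}\, A_i\, \mathbb{J}_D(L_j) = \frac{\partial}{\partial\theta_j}\, \mathrm{Tr}\, D_\theta A_i = 0$$

if $i \ne j$. Hence the matrix $X$ has the form

$$\begin{pmatrix} C(D) & I_2 \\ I_2 & J(D) \end{pmatrix}, \qquad (22)$$

where

$$C(D) = \begin{pmatrix} \mathrm{Tr}\, A_1 \mathbb{J}_D(A_1) & \mathrm{Tr}\, A_1 \mathbb{J}_D(A_2) \\ \mathrm{Tr}\, A_2 \mathbb{J}_D(A_1) & \mathrm{Tr}\, A_2 \mathbb{J}_D(A_2) \end{pmatrix}$$

and

$$J(D) = \begin{pmatrix} \mathrm{Tr}\, L_1 \mathbb{J}_D(L_1) & \mathrm{Tr}\, L_1 \mathbb{J}_D(L_2) \\ \mathrm{Tr}\, L_2 \mathbb{J}_D(L_1) & \mathrm{Tr}\, L_2 \mathbb{J}_D(L_2) \end{pmatrix}.$$

The positivity of (22) implies the statement of the theorem. □

We have the orthogonal decomposition

$$\{B = B^* : [D, B] = 0\} \oplus \{i[D, A] : A = A^*\} \qquad (23)$$

of the self-adjoint matrices and we denote the two subspaces by $\mathcal{M}_D$ and $\mathcal{M}_D^c$,
respectively.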

Example 7 The Fisher information and the covariance are easily handled if $D$ is
diagonal, $D = \mathrm{Diag}(\lambda_1, \dots, \lambda_n)$ or formulated by the matrix units $E(ij)$

$$D = \sum_i \lambda_i E(ii).$$

The general formulas in case of diagonal $D$ are

$$\gamma_D(A, A) = \sum_{ij} \frac{1}{\lambda_j f(\lambda_i / \lambda_j)} |A_{ij}|^2, \qquad \mathrm{Cov}_D(A, A) = \sum_{ij} \lambda_j f(\lambda_i / \lambda_j) |A_{ij}|^2 - (\mathrm{Tr}\, DA)^2.$$

Moreover,

$$\gamma_D^f(i[D, X], i[D, X]) = \sum_{ij} \frac{(\lambda_i - \lambda_j)^2}{\lambda_j f(\lambda_i / \lambda_j)} |X_{ij}|^2. \qquad (24)$$

Hence for diagonal $D$ all Fisher informations have a simple explicit formula.

The description of the commutators is more convenient if the eigenvalues are different.
Let

$$S_1(ij) := E(ij) + E(ji), \qquad S_2(ij) := -iE(ij) + iE(ji)$$

for $i < j$. (They are the generalization of the Pauli matrices $\sigma_1$ and $\sigma_2$.) We have

$$i[D, S_1(ij)] = (\lambda_j - \lambda_i) S_2(ij), \qquad i[D, S_2(ij)] = (\lambda_i - \lambda_j) S_1(ij).$$

In Example 1 we have $f(x) = (1 + x)/2$. This gives the minimal Fisher information
described in Theorem 1.1:

$$\gamma_D(A, B) = \int_0^\infty \mathrm{Tr}\, A \exp(-tD/2)\, B \exp(-tD/2)\, dt.$$

The corresponding covariance is the symmetrized $\mathrm{Cov}_D(A, B)$. This is maximal among
the variances.

From Example 2 we have the maximal Fisher information

$$\gamma_D(A, B) = \frac{1}{2}\, \mathrm{Tr}\, D^{-1} (AB + BA).$$

The corresponding covariance is a bit similar to the minimal Fisher information:

$$\mathrm{Cov}_D(A, B) = \int_0^\infty \mathrm{Tr}\, A \exp(-tD^{-1}/2)\, B \exp(-tD^{-1}/2)\, dt - \mathrm{Tr}\, DA\; \mathrm{Tr}\, DB.$$

Example [3] leads to the Boguliubov-Kubo-Mori inner product as Fisher information 
HI 12]: 

$$\gamma_D(A, B) = \int_0^\infty \mathrm{Tr}\, A (D + t)^{-1} B (D + t)^{-1}\, dt.$$

It is also called BKM Fisher information, the characterization is in the paper [14] and it is 
also proven that this gives a large deviation bound of consistent superefficient estimators 

m\- □ 

Let $\mathcal{M} := \{\rho(\theta) : \theta \in G\}$ be a smooth $k$-dimensional manifold of invertible density
matrices. The quantum score operators (or logarithmic derivatives) are defined as

$$L_i^f(\theta) := (\mathbb{J}_{\rho(\theta)}^f)^{-1} (\partial_{\theta_i} \rho(\theta)) \qquad (1 \le i \le k) \qquad (25)$$

and

$$J(\theta)_{ij} := \mathrm{Tr}\, L_i^f(\theta)\, \mathbb{J}_{\rho(\theta)}^f (L_j^f(\theta)) = \mathrm{Tr}\, (\mathbb{J}_{\rho(\theta)}^f)^{-1} (\partial_{\theta_i} \rho(\theta)) (\partial_{\theta_j} \rho(\theta)) \qquad (1 \le i, j \le k) \qquad (26)$$

is the quantum Fisher information matrix (depending on the function $f$). The
function $f(x) = (x + 1)/2$ yields the symmetric logarithmic derivative (SLD) Fisher
information.



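Theorem 2.3 Let $\beta : M_n \to M_m$ be a completely positive trace preserving mapping and
let $\mathcal{M} := \{\rho(\theta) \in M_n : \theta \in G\}$ be a smooth $k$-dimensional manifold of invertible density
matrices. For the Fisher information matrix $J_1(\theta)$ of $\mathcal{M}$ and for the Fisher information
matrix $J_2(\theta)$ of $\beta(\mathcal{M}) := \{\beta(\rho(\theta)) : \theta \in G\}$ we have the monotonicity relation

$$J_2(\theta) \le J_1(\theta).$$

Proof: We set $B_i(\theta) := \partial_{\theta_i} \rho(\theta)$. Then $\mathbb{J}_{\beta(\rho(\theta))}^{-1} \beta(B_i(\theta))$ is the score operator of $\beta(\mathcal{M})$
and for real numbers $a_1, \dots, a_k$ we have

$$\sum_{ij} J_2(\theta)_{ij}\, a_i a_j = \mathrm{Tr}\, \mathbb{J}_{\beta(\rho)}^{-1} \Big( \beta \Big( \sum_i a_i B_i(\theta) \Big) \Big)\, \beta \Big( \sum_j a_j B_j(\theta) \Big)$$

$$= \Big\langle \sum_i a_i B_i(\theta),\; \beta^* \mathbb{J}_{\beta(\rho)}^{-1} \beta \Big( \sum_j a_j B_j(\theta) \Big) \Big\rangle$$

$$\le \Big\langle \sum_i a_i B_i(\theta),\; \mathbb{J}_\rho^{-1} \Big( \sum_j a_j B_j(\theta) \Big) \Big\rangle = \sum_{ij} J_1(\theta)_{ij}\, a_i a_j,$$

where (14) was used. □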

The monotonicity of the Fisher information matrix in some particular cases appeared 
already in the literature: [3H] treated the case of the Kubo-Mori inner product and [1] 
considered the symmetric logarithmic derivative and measurement in the role of coarse 
graining. 

Example 8 The function

$$f_\beta(x) = \beta (1 - \beta)\, \frac{(x - 1)^2}{(x^\beta - 1)(x^{1-\beta} - 1)} \qquad (27)$$

is operator monotone if $0 < \beta < 2$. Formally $f(1)$ is not defined, but as a limit it is 1.
The property $x f(x^{-1}) = f(x)$ also holds. Therefore this function determines a Fisher
information [39]. If $\beta = 1/2$, then the variance has a simple formula:

$$\mathrm{Var}_D A = \frac{1}{2}\, \mathrm{Tr}\, D^{1/2} (D^{1/2} A + A D^{1/2}) A - (\mathrm{Tr}\, DA)^2.$$

□

Example 9 The functions $x^{-\alpha}$ and $x^{\alpha - 1}$ are matrix monotone decreasing and so is their
sum. Therefore

$$f_\alpha(x) = \frac{2}{x^{-\alpha} + x^{\alpha - 1}}$$

is a standard operator monotone function.

$$\gamma_\sigma^{f_\alpha}(\rho, \rho) = 1 + \mathrm{Tr}\, (\rho - \sigma) \sigma^{-\alpha} (\rho - \sigma) \sigma^{\alpha - 1}$$

may remind us of the abstract Fisher information, however now $\rho$ and $\sigma$ are positive
definite density matrices. In the paper

$$\chi_\alpha^2(\rho, \sigma) = \mathrm{Tr}\, \big( (\rho - \sigma) \sigma^{-\alpha} (\rho - \sigma) \sigma^{\alpha - 1} \big)$$

is called quantum $\chi^2$-divergence. (If $\rho$ and $\sigma$ commute, then the formula is
independent of $\alpha$.) Up to the constant 1, this is an interesting and important particular
case of the monotone metric. The general theory (19) implies the monotonicity of the
$\chi^2$-divergence. □



3 Extended monotone metrics 

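As an extension of the papers [SJ HO] Kumagai made the following generalization.
Now $H_n^+$ denotes the strictly positive matrices in $M_n$. Formally $K_\rho(A, B) \in \mathbb{C}$ is defined
for all $\rho \in H_n^+$, $A, B \in M_n$ and $n \in \mathbb{N}$ and it is assumed that

(i) $(A, B) \mapsto K_\rho(A, B)$ is an inner product on $M_n$ for every $\rho \in H_n^+$ and $n \in \mathbb{N}$.

(ii) $\rho \mapsto K_\rho(A, B)$ is continuous.

(iii) For a trace-preserving completely positive mapping $\beta$

$$K_{\beta(\rho)}(\beta(A), \beta(A)) \le K_\rho(A, A)$$

holds.

In the paper [26] such $K_\rho(A, B)$ is called extended monotone metric and the
description is

$$K_\rho(A, B) = b(\mathrm{Tr}\, \rho)\, \mathrm{Tr}\, A^*\, \mathrm{Tr}\, B + c \langle A, (\mathbb{J}_\rho^f)^{-1}(B) \rangle,$$

where $f : \mathbb{R}^+ \to \mathbb{R}^+$ is matrix monotone, $f(1) = 1$, $b : \mathbb{R}^+ \to \mathbb{R}^+$ and $c > 0$. Note that

$$(A, B) \mapsto b(\mathrm{Tr}\, \rho)\, \mathrm{Tr}\, A^*\, \mathrm{Tr}\, B \quad \text{and} \quad (A, B) \mapsto c \langle A, (\mathbb{J}_\rho^f)^{-1} B \rangle$$

satisfy conditions (ii) and (iii) with constant $c > 0$. The essential point is to check

$$b(\mathrm{Tr}\, \rho)\, \mathrm{Tr}\, A^*\, \mathrm{Tr}\, A + c \langle A, (\mathbb{J}_\rho^f)^{-1} A \rangle \ge 0.$$

In the case of $1 \times 1$ matrices this is

$$b(x) |z|^2 + \frac{c}{x} |z|^2 \ge 0,$$

which gives the condition $x b(x) + c > 0$. If this is true, then for $\rho = \mathrm{Diag}(\lambda_1, \dots, \lambda_n)$

$$\Big( \sum_i \lambda_i \Big) b \Big( \sum_i \lambda_i \Big) \Big| \sum_i A_{ii} \Big|^2 + c \Big( \sum_i \lambda_i \Big) \sum_{ij} \frac{1}{m_f(\lambda_i, \lambda_j)} |A_{ij}|^2$$

$$\ge -c\, \Big| \sum_i A_{ii} \Big|^2 + c \Big( \sum_i \lambda_i \Big) \sum_i \frac{1}{\lambda_i} |A_{ii}|^2.$$

The positivity is the inequality

$$\Big( \sum_i \lambda_i \Big) \Big( \sum_i \frac{1}{\lambda_i} |A_{ii}|^2 \Big) \ge \Big| \sum_i A_{ii} \Big|^2,$$

which is a consequence of the Schwarz inequality.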

4 Skew information 

The Wigner-Yanase-Dyson skew information is the quantity

$$I_p(D, A) := -\frac{1}{2}\, \mathrm{Tr}\, [D^p, A][D^{1-p}, A] \qquad (0 < p < 1).$$

Actually, the case $p = 1/2$ is due to Wigner and Yanase [47] and the extension was
proposed by Dyson. The convexity of $I_p(D, A)$ in $D$ is a famous result of Lieb.

It was observed in [39] that the Wigner-Yanase-Dyson skew information is connected
to the Fisher information which corresponds to the function (27). For this function
(with $\beta = p$) we have

$$\gamma_D(i[D, A], i[D, A]) = -\frac{1}{p(1 - p)}\, \mathrm{Tr}\, ([D^p, A][D^{1-p}, A]). \qquad (28)$$

Apart from a constant factor this expression is the skew information proposed by Wigner
and Yanase [47]. In the limiting cases $p \to 0$ or 1 we have the function (13) corresponding
to the Kubo-Mori-Boguliubov case.



Let $f$ be a standard function and $A = A^* \in M_n$. The quantity

$$I_D^f(A) := \frac{f(0)}{2}\, \gamma_D^f(i[D, A], i[D, A])$$

was called skew information in [16] in this general setting. So the skew information is
nothing else but the Fisher information restricted to $\mathcal{M}_D^c$, but it is parametrized by the
commutator. Skew information appeared twenty years before the concept of quantum 
Fisher information. Skew information appears in a rather big literature, for example, 
connection with uncertainty relations [31 QUI El [131 E51 Ell E2]- 



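If $D = \mathrm{Diag}(\lambda_1, \dots, \lambda_n)$ is diagonal, then

$$\gamma_D^f(i[D, A], i[D, A]) = \sum_{ij} \frac{(\lambda_i - \lambda_j)^2}{\lambda_j f(\lambda_i / \lambda_j)} |A_{ij}|^2.$$

This implies that the identity

$$I_D^f(A) = \mathrm{Cov}_D(A, A) - \mathrm{Cov}_D^{\tilde f}(A, A) \qquad (29)$$

holds if $\mathrm{Tr}\, DA = 0$ and

$$\tilde f(x) := \frac{1}{2} \left( (x + 1) - (x - 1)^2\, \frac{f(0)}{f(x)} \right). \qquad (30)$$

It was proved in [S] that for a standard function $f : \mathbb{R}^+ \to \mathbb{R}$, $\tilde f$ is standard as well.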
Another proof is in [15] which contains the following theorem. 

Theorem 4.1 Assume that $X = X^* \in M_n$ and $\mathrm{Tr}\, DX = 0$. If $f$ is a standard function
such that $f(0) \ne 0$, then

$$\frac{\partial^2}{\partial t\, \partial s}\, S_F(D + t\, i[D, X], D + s\, i[D, X])\, \Big|_{t=s=0} = f(0)\, \gamma_D^f(i[D, X], i[D, X])$$

for the standard function $F = \tilde f$.

All skew informations are obtained from an $f$-divergence (or quasi-entropy) by
differentiation.

Example 10 The function

$$f(x) = \left( \frac{1 + \sqrt{x}}{2} \right)^2 \qquad (31)$$

gives the Wigner-Yanase skew information

$$I^{WY}(D, A) = I_{1/2}(D, A) = -\frac{1}{2}\, \mathrm{Tr}\, [D^{1/2}, A]^2.$$

The skew information coming from the minimal Fisher information is often
denoted as $I^{SLD}(D, A)$. The simple mean inequalities

$$\left( \frac{1 + \sqrt{x}}{2} \right)^2 \le \frac{1 + x}{2} \le 2 \left( \frac{1 + \sqrt{x}}{2} \right)^2$$

imply

$$I^{WY}(D, A) \le I^{SLD}(D, A) \le 2 I^{WY}(D, A).$$

□ 
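
The two-sided bound of Example 10 can be checked numerically for a diagonal $D$, using $I_D^f(A) = \frac{f(0)}{2} \gamma_D^f(i[D, A], i[D, A])$ together with the diagonal formula for $\gamma_D^f$. The sketch below is an illustration under stated assumptions: the eigenvalues and the Hermitian matrix $A$ are arbitrary test data, and the function names are hypothetical.

```python
import math


def skew_information(f, f_at_zero, eigenvalues, A):
    """I_D^f(A) = (f(0)/2) * gamma_D^f(i[D, A], i[D, A]) for D = Diag(eigenvalues).

    Uses gamma_D^f(i[D,A], i[D,A]) = sum_ij (l_i - l_j)^2 / (l_j f(l_i / l_j)) |A_ij|^2.
    """
    n = len(eigenvalues)
    total = 0.0
    for i in range(n):
        for j in range(n):
            li, lj = eigenvalues[i], eigenvalues[j]
            total += (li - lj) ** 2 / (lj * f(li / lj)) * abs(A[i][j]) ** 2
    return 0.5 * f_at_zero * total


def f_wy(x):
    """Function (31); f_wy(0) = 1/4."""
    return ((1.0 + math.sqrt(x)) / 2.0) ** 2


def f_sld(x):
    """Arithmetic mean function of the minimal Fisher information; f_sld(0) = 1/2."""
    return (1.0 + x) / 2.0


eigenvalues = [0.5, 0.3, 0.2]          # arbitrary density matrix spectrum
A = [[1.0, 2.0 - 1.0j, 0.5],           # arbitrary Hermitian observable
     [2.0 + 1.0j, -1.0, 1.0j],
     [0.5, -1.0j, 0.0]]

i_wy = skew_information(f_wy, 0.25, eigenvalues, A)
i_sld = skew_information(f_sld, 0.5, eigenvalues, A)
print(i_wy, i_sld, i_wy <= i_sld <= 2.0 * i_wy)
```

For the Wigner-Yanase function the same value is obtained from $-\frac{1}{2} \mathrm{Tr}\, [D^{1/2}, A]^2 = \frac{1}{2} \sum_{ij} (\sqrt{\lambda_i} - \sqrt{\lambda_j})^2 |A_{ij}|^2$, which gives an independent check of the general formula.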